Myocardial scar and left ventricular ejection fraction classification for electrocardiography image using multi-task deep learning

Myocardial scar (MS) and left ventricular ejection fraction (LVEF) are vital cardiovascular parameters, conventionally determined using cardiac magnetic resonance (CMR). However, given the high cost and limited availability of CMR in resource-constrained settings, electrocardiograms (ECGs) are a cost-effective alternative. We developed computer vision-based multi-task deep learning models to analyze 12-lead ECG 2D images, predicting MS and LVEF < 50%. Our dataset comprises 14,052 ECGs with clinical features, with ground truth labels derived from CMR. Our top-performing model achieved AUC values of 0.838 (95% CI 0.812–0.862) for MS and 0.939 (95% CI 0.921–0.954) for LVEF < 50% classification, outperforming cardiologists. Moreover, MS predictions in a prevalence-specific test dataset recorded an AUC of 0.812 (95% CI 0.810–0.814). 1D signals extracted from ECG images yielded inferior performance compared with the 2D approach. In conclusion, our results demonstrate the potential of computer-based MS and LVEF < 50% classification from ECG scan images in clinical screening, offering a cost-effective alternative to CMR.


Cardiac Magnetic Resonance (CMR) imaging protocols & parameters
CMR imaging was performed using a Gyroscan NT Intera 1.5-Tesla Philips scanner (Philips Medical Systems, Best, the Netherlands) during 2009-2016, and using an Ingenia 3.0T MR system (Philips Medical Systems) during 2017-2021.
For delayed-enhancement cardiovascular magnetic resonance (DE-CMR) images, a dose of 0.15 mmol per kg of body weight of gadolinium-based MRI contrast agent [gadoterate meglumine (Dotarem; Guerbet, Paris, France), gadobutrol (Gadovist; Bayer Healthcare, Leverkusen, Germany), or gadopentetate dimeglumine (Magnevist; Bayer Healthcare)] was injected intravenously. The inversion time was adjusted to null normal myocardium. The viability study was assessed in the short-axis, 2-chamber, and 4-chamber views 10 minutes after intravenous administration. The images were acquired using a 3-dimensional segmented gradient echo and inversion-recovery sequence with the following parameters: 1.25 ms echo time, 4.1 ms repetition time, 15° flip angle, 303 × 384-mm field of view, 240 × 256 matrix, 1.26 × 1.5 mm in-plane resolution, 8 mm slice thickness, and 1.5 sensitivity encoding factor.
Left ventricular ejection fraction (LVEF) was assessed using left ventricular end-systolic and end-diastolic volumes calculated from multiple-slice short-axis images. Delayed-enhancement cardiovascular magnetic resonance (DE-CMR) images were interpreted to determine the location and pattern of the myocardial scar (MS). CMR image analysis was performed on the ISP workstation (IntelliSpace Portal 9.0, Philips Healthcare, Best, the Netherlands).
The 17-segment model recommended by the American Heart Association 1 was used, with the exception of segment 17. Delayed-enhancement images were read while blinded to the ECG results.
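The LVEF computation from end-diastolic and end-systolic volumes can be illustrated with the standard ejection fraction formula, LVEF = (EDV - ESV) / EDV × 100. The sketch below is a minimal illustration and is not the workstation's implementation; the function names and the input validation are assumptions, and the 50% cut-off mirrors the LVEF < 50% label used in this study.

```python
def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction (%) from end-diastolic (EDV) and end-systolic (ESV) volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    stroke_volume = edv_ml - esv_ml  # volume ejected per beat
    return 100.0 * stroke_volume / edv_ml


def is_reduced_lvef(edv_ml: float, esv_ml: float, threshold: float = 50.0) -> bool:
    """Binary label used for classification: True when LVEF is below the threshold."""
    return lvef_percent(edv_ml, esv_ml) < threshold
```

For example, an end-diastolic volume of 150 mL with an end-systolic volume of 90 mL gives an LVEF of 40%, which falls in the LVEF < 50% class.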

Electrocardiogram (ECG) data preprocessing
We preprocessed the raw portable document format (PDF) files of full ECG reports into 12-lead ECG scan images in three main steps. First, we converted the PDF file into an image file for each ECG report using the pdf2image library in Python. 2 We then isolated the region containing the ECG signals and, for new-format ECGs, removed the gridlines by filtering out shades of red. Finally, to normalize all of the ECGs, we rearranged the ECG leads into two vertical columns (6 rows × 2 columns). The first column contains leads I, II, III, aVR, aVL, and aVF, while the second column contains V1, V2, V3, V4, V5, and V6 (Figure 1B). We also cropped the signal regions of the old-format ECGs so that they show approximately the same number of QRS complexes as the new-format ECGs. Voltage and time scales are the same across ECGs within the same format but differ between the old and new formats. This might encourage the models to rely on the relative size of signals within each format rather than on their actual pixel size in the image. The average processing times for old-format and new-format ECGs are 0.389 and 0.514 seconds, respectively. Processing a new-format ECG takes slightly longer because of the gridlines and their subsequent removal. These times were measured on an AMD Ryzen 9 Mobile 4900H laptop CPU.
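The gridline removal and lead rearrangement steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the rule for what counts as a "shade of red" (red channel exceeding green and blue by a hypothetical `margin`), the function names, and the requirement that all lead strips share one shape are all assumptions. The PDF-to-image conversion is omitted, and the input is taken to be an RGB array.

```python
import numpy as np

# Target layout from the text: limb leads in the left column, precordial leads on the right.
LEAD_GRID = [
    ("I", "V1"), ("II", "V2"), ("III", "V3"),
    ("aVR", "V4"), ("aVL", "V5"), ("aVF", "V6"),
]


def remove_red_gridlines(rgb: np.ndarray, margin: int = 40) -> np.ndarray:
    """Return a copy of an H x W x 3 uint8 image with reddish pixels set to white.

    A pixel is treated as reddish when its red channel exceeds both the green
    and blue channels by at least `margin` (a hypothetical threshold).
    """
    img = rgb.astype(np.int16)  # widen so channel differences cannot wrap around
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    reddish = (r - g >= margin) & (r - b >= margin)
    cleaned = rgb.copy()
    cleaned[reddish] = 255  # paint gridline pixels white in all three channels
    return cleaned


def arrange_leads(strips: dict) -> np.ndarray:
    """Stack twelve equally sized lead strips into the 6-row x 2-column layout."""
    rows = [np.hstack([strips[left], strips[right]]) for left, right in LEAD_GRID]
    return np.vstack(rows)
```

Dark signal traces have similar values in all three channels, so they are left untouched by the red filter, while pink or red grid pixels are replaced with the background color.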

Model architecture
The model uses ResNet34d 3 as the core backbone architecture. We used images resized to 384 × 384 pixels and applied several augmentations, including blurring, brightness, and contrast adjustments. The intermediate representation has a size of 512 and is forwarded to two separate dense classification heads: one for MS classification and the other for LVEF range classification. We developed the models using the PyTorch framework (PyTorch Foundation, Wilmington, Delaware) and Python (Python Software Foundation, Beaverton, Oregon).
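The shared-backbone, two-head design described above can be sketched in PyTorch as follows. This is an illustrative sketch only: the class and function names are hypothetical, a tiny convolutional stand-in replaces ResNet34d so that the example runs without extra dependencies, and the number of output classes per head, the 3-channel input, and the unweighted sum of cross-entropy losses are assumptions not stated in the text.

```python
import torch
from torch import nn
import torch.nn.functional as F


def stand_in_backbone(feat_dim: int = 512) -> nn.Module:
    """Tiny placeholder encoder (NOT ResNet34d) mapping images to feat_dim features."""
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=7, stride=4, padding=3),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),   # global average pooling -> (N, 16, 1, 1)
        nn.Flatten(),              # -> (N, 16)
        nn.Linear(16, feat_dim),   # -> (N, feat_dim)
    )


class TwoHeadECGNet(nn.Module):
    """Shared image encoder followed by one dense head per task."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 512,
                 n_ms_classes: int = 2, n_lvef_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        self.ms_head = nn.Linear(feat_dim, n_ms_classes)      # myocardial scar head
        self.lvef_head = nn.Linear(feat_dim, n_lvef_classes)  # LVEF range head

    def forward(self, images: torch.Tensor):
        features = self.backbone(images)  # 512-sized intermediate representation
        return self.ms_head(features), self.lvef_head(features)


def multitask_loss(ms_logits, lvef_logits, ms_labels, lvef_labels):
    """Unweighted sum of the two cross-entropy losses (an assumed choice)."""
    return F.cross_entropy(ms_logits, ms_labels) + F.cross_entropy(lvef_logits, lvef_labels)
```

Because both heads read the same 512-sized representation, gradients from both tasks update the shared encoder, which is the property that distinguishes the multi-task model from two separate single-task models.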

Incorporating clinical features into the models
This model processes the ECG image through the ResNet34d backbone and processes the clinical features using a recurrent neural network (RNN)-based backbone before using the output from both to predict MS and LVEF range.
Embedding layers and RNN layers have been shown to boost classification performance when used together as a feature extractor 4 and to outperform methods relying on multilayer perceptrons (MLPs). In this study, we used a bidirectional RNN (BRNN) to represent the clinical features, which were then concatenated to the intermediate layer for prediction. Categorical clinical features are transformed into a 512-sized embedding vector via an embedding layer, and the only numerical feature (age) is normalized and concatenated to the embedding vector. The combined vector is then passed to the BRNN layer to create a summary vector (Figure 2D). ECG and clinical representations are then concatenated before predictions are made.
In addition, we trained a similar single-task model with both image and clinical features as inputs for each task (Figure 2E). This helps verify whether training in a multi-task manner is more effective than training two separate single-task models when clinical features are also used.
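The clinical-feature branch and its fusion with the image representation can be sketched as below. The text does not specify how the sequence fed to the BRNN is formed, so this sketch makes loudly labeled assumptions: each categorical feature becomes one 512-sized token, the normalized age is appended to every token, a plain bidirectional `nn.RNN` is used, and the summary vector is the concatenation of the final forward and backward hidden states. All class names, the hidden size, and the head sizes are hypothetical.

```python
import torch
from torch import nn


class ClinicalBRNN(nn.Module):
    """Embeds categorical features and summarizes them with a bidirectional RNN."""

    def __init__(self, cardinalities, embed_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        # One embedding table per categorical feature (assumed design).
        self.embeddings = nn.ModuleList(nn.Embedding(n, embed_dim) for n in cardinalities)
        # +1 input channel for the normalized age appended to each token.
        self.rnn = nn.RNN(embed_dim + 1, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, categorical: torch.Tensor, age: torch.Tensor) -> torch.Tensor:
        # categorical: (N, F) integer codes; age: (N,) already normalized.
        tokens = torch.stack(
            [emb(categorical[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )                                                    # (N, F, embed_dim)
        age_col = age.view(-1, 1, 1).expand(-1, tokens.size(1), 1)
        sequence = torch.cat([tokens, age_col], dim=2)       # (N, F, embed_dim + 1)
        _, h_n = self.rnn(sequence)                          # h_n: (2, N, hidden_dim)
        return torch.cat([h_n[0], h_n[1]], dim=1)            # (N, 2 * hidden_dim)


class FusionNet(nn.Module):
    """Concatenates image and clinical representations before the two task heads."""

    def __init__(self, backbone: nn.Module, clinical: nn.Module,
                 img_dim: int = 512, clin_dim: int = 512,
                 n_ms_classes: int = 2, n_lvef_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        self.clinical = clinical
        self.ms_head = nn.Linear(img_dim + clin_dim, n_ms_classes)
        self.lvef_head = nn.Linear(img_dim + clin_dim, n_lvef_classes)

    def forward(self, images, categorical, age):
        fused = torch.cat([self.backbone(images), self.clinical(categorical, age)], dim=1)
        return self.ms_head(fused), self.lvef_head(fused)
```

A single-task variant of the same sketch would keep only one of the two heads, which is the comparison this section describes.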

Training strategy
All of the models developed in this study were trained for 30 epochs using the AdamW optimizer 5 with an initial learning rate of 0.0005 that was scheduled to decrease linearly over the training steps. A simple early stopping strategy was also adopted in which training terminates once the validation loss plateaus for five epochs. The transferred model was trained using old-format ECGs for ten epochs, after which it was transferred to learn using new-format ECGs for another ten epochs.
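The learning rate schedule and early stopping rule can be sketched in plain Python as follows. This is a framework-free illustration under assumptions: the text does not state the final learning rate, so the schedule here decays to zero at the last step, and a "plateau" is interpreted as no improvement over the best validation loss. The names `linear_lr`, `EarlyStopping`, and `epochs_run` are hypothetical.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-4) -> float:
    """Learning rate decaying linearly from base_lr at step 0 to 0 at total_steps (assumed endpoint)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


class EarlyStopping:
    """Signals a stop after `patience` consecutive epochs without a new best validation loss."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def epochs_run(val_losses, max_epochs: int = 30, patience: int = 5) -> int:
    """Number of epochs actually trained given a sequence of validation losses."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

With this rule, a run whose validation loss stops improving after epoch 3 terminates at epoch 8, while a run that keeps improving uses the full 30 epochs.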

Interpretation process by cardiologists
One experienced cardiologist and one cardiologist in training were recruited; they independently reviewed and interpreted ECGs from the new-format test datasets. The interpretation was based on criteria for prior or silent/unrecognized myocardial infarction. 6

Old-format ECGs from 2009 to 2014 were used for training, while old-format ECGs from 2015 and 2016 were used as development and test sets, respectively. New-format ECGs from 2017 and from 2019 to 2021 were used for training. New-format ECGs from 2018 to 2019 were used as a development set, and new-format ECGs from 2018 were used as a test set.

Table 1 .2 Model performance evaluation of MS classification when using the new-format test set. Columns: Model; New-format test set (N=1,264).
Using the abnormal Q wave definition (Q or QS wave abnormality [Q/QS] [code 1.1, 1.2]) published in The Minnesota Code Manual of Electrocardiographic Findings. 7